Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Tests for Metrics API enhancement to include error counters #7423

Merged
merged 13 commits into from
Jul 12, 2024

Conversation

indrajit96
Copy link
Contributor

@indrajit96 indrajit96 commented Jul 8, 2024

What does the PR do?

Currently We don't have any mechanism to count any failures that happen in triton core before the request gets to the backend.
Enhances existing metrics API to incorporate failure and failure counts in inference requests.
This PR adds new test

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

triton-inference-server/core#377

Where should the reviewer start?

RespondIfError function call in core

Test plan:

Test for REJECTED
Modified qa/L0_model_queue/model_queue_test.py to capture rejections in Ensemble Model and Non Ensemble Model in dynamic batcher.
custom_zero_1_float32 model expected to have 4 request rejects

Test for OTHER
Modified qa/L0_model_queue/model_queue_test.py to capture OTHER errors in Ensemble Model.
ensemble_zero_1_float32 expects to have 2 request rejected reported as OTHER.

Test for CANCELED
Modified qa/L0_request_cancellation/scheduler_test.py to capture various cancellations.
_test_sequence_batch_scheduler_queued_request_cancellation test cancel 2 requests, expected to be reported as 2 in test_direct_sequence_batch_scheduler_request_cancellation() and test_sequence_batch_scheduler_backlog_request_cancellation()

Test for BACKEND
Modified qa/L0_backend_python/lifecycle/lifecycle_test.py which generates errors from backend inside test_infer_pymodel_error by using wrong model name.
Expect the error count to be 1.

Untested :

  • REJECTED request which gets triggered from SequenceBatchScheduler()
  • OTHER uncategorized failures for individual (non-ensemble) models
  • Follow-up ticket for further testing: TODO @indrajit96

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes Jira issue: DLIS-5530

@indrajit96 indrajit96 changed the title Tests for Metrics API enhancement to include error counters test:Tests for Metrics API enhancement to include error counters Jul 8, 2024
@rmccorm4 rmccorm4 changed the title test:Tests for Metrics API enhancement to include error counters test: Tests for Metrics API enhancement to include error counters Jul 11, 2024
Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM overall, will save further nitpicks since this is planned to be cherry-picked. Let's get some docs additions in metrics.md with this PR to explain this feature so that the docs update can also be cherry-picked together with the feature. We can then point to the new docs in the release notes.

@indrajit96
Copy link
Contributor Author

Screenshot 2024-07-11 at 10 31 34 AM
Newly added docs

@rmccorm4 rmccorm4 added PR: docs Documentation only changes PR: test Adding missing tests or correcting existing test labels Jul 11, 2024
@indrajit96 indrajit96 merged commit 70a0eee into main Jul 12, 2024
2 checks passed
@indrajit96 indrajit96 deleted the ibhosale-metrics branch July 12, 2024 01:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: docs Documentation only changes PR: test Adding missing tests or correcting existing test
Development

Successfully merging this pull request may close these issues.

2 participants